Skip to content

[wit-parser] Migrate to structured errors in the AST/package parser#2465

Open
PhoebeSzmucer wants to merge 7 commits intobytecodealliance:mainfrom
PhoebeSzmucer:structured-parsing-errors
Open

[wit-parser] Migrate to structured errors in the AST/package parser#2465
PhoebeSzmucer wants to merge 7 commits intobytecodealliance:mainfrom
PhoebeSzmucer:structured-parsing-errors

Conversation

@PhoebeSzmucer
Copy link
Contributor

This PR replaces anyhow error handling in the AST/parser laye rwith structured PackageParseErrors and PackageParseErrorKind types. Errors now carry span information directly, enabling better error diagnostics and better integration with downstream tools that use wit-parser programatically.

Part of a larger change: #2460 (see that link for API discussion).

}

#[derive(Debug)]
pub enum Error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thsi went away now in favor of just using PackageParseError


#[non_exhaustive]
#[derive(Debug, PartialEq, Eq)]
pub enum PackageParseErrorKind {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note - I named it "package parse", because in the future we might want to expose a dedicated error for the "file parse" stage (a single WIT package can consist of many files).

Parsing is also usually thought of as converting a single file into an AST, so I think mentioning that this is about parsing a whole package makes things a bit clearer.

///
/// On failure returns `Err((self, e))` so the caller can use the source
/// map for error formatting if needed.
pub fn parse(self) -> Result<UnresolvedPackageGroup, (Self, PackageParseErrors)> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we're now returning the source map as well. Otherwise the consumers would need to clone the source map if they want to use it for resolving spans (since parse consumes self).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'd be reasonable to change this signature to &self to avoid the need for that? That's a bit more idiomatic and I'm not seeing off the top of my head why self is used here vs &self

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason it takes self is that UnresolvedPackageGroup takes the ownership of source_map: SourceMap, so we'd either have to clone or introduce a lifetime.

SourceMap contains the contents of multiple files so it might not be very cheap to clone.

I will investigate if UnresolvedPackageGroup can just contain a reference to a source map.

/// Runs `f` and, on error, attempts to add source highlighting to resolver
/// error types that still use `anyhow`. Only needed until the resolver is
/// migrated to structured errors.
pub(crate) fn rewrite_error<F, T>(&self, f: F) -> anyhow::Result<T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will go away once the resolver is also migrated to structured errors.

@@ -1,4 +1,4 @@
name `nonexistent` is not defined
name `nonexistent` does not exist
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously we were doing a mix of "does not exist" and "is not defined". I'm going to standarize on "does not exist".

Once the structured error migration is done we might want to audit all error messages and see if anything could be better (now that we can track rich metadata easily).

@PhoebeSzmucer PhoebeSzmucer marked this pull request as ready for review March 13, 2026 10:36
@PhoebeSzmucer PhoebeSzmucer requested a review from a team as a code owner March 13, 2026 10:36
@PhoebeSzmucer PhoebeSzmucer requested review from alexcrichton and removed request for a team March 13, 2026 10:36
@alexcrichton
Copy link
Member

I apologize I've been extra busy the past few days and I'm about to get even busier traveling for a conference + then a vacation. I wanted to at least comment and say that I haven't forgotten about this and I'll try to get to it when I can. Other maintainers can certainly take a look too, but @PhoebeSzmucer wanted to at least let you know I may take a bit.

@PhoebeSzmucer
Copy link
Contributor Author

@alexcrichton that's completely fine - I'm actually also on vacation now, plus this PR and the LSP is just a side project for me, so it can definitely wait.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for your work on this! Overall looks good, but some comments on idioms/style below

Comment on lines +1275 to +1280
return Err(PackageParseErrors::from(
PackageParseErrorKind::Syntax {
span: f.name.span,
message: "duplicate constructors".to_owned(),
},
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding this slightly, one option here might be:

return Err(PackageParseErrors::new_syntax(f.name.span, "duplicate constructors"))

where the constructor new_syntax takes message: impl Into<String> or somthing like that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you want me to do this for all variants? I'm worried about things like:

    ItemNotFound {
        span: Span,
        name: String,
        kind: String,
        hint: Option<String>,
    },

where the order of strings matters (ie PackageParseErrors::item_not_found(span, "A", "B", None) is different to PackageParseErrors::item_not_found(span, "B", "A", None)) and it's easy to accidentally mix it up.

And we'll also have some variants that contain multiple spans (e.g. duplicate package).

kind: &str,
deps: &IndexMap<&'a str, Vec<Id<'a>>>,
) -> Result<Vec<&'a str>, Error> {
) -> Result<Vec<&'a str>, PackageParseErrorKind> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/PackageParseErrorKind/PackageParseErrors/

///
/// On failure returns `Err((self, e))` so the caller can use the source
/// map for error formatting if needed.
pub fn parse(self) -> Result<UnresolvedPackageGroup, (Self, PackageParseErrors)> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it'd be reasonable to change this signature to &self to avoid the need for that? That's a bit more idiomatic and I'm not seeing off the top of my head why self is used here vs &self

let mut map = SourceMap::default();
map.push_str(path, contents);
map.parse()
.map_err(|(map, e)| anyhow::anyhow!("{}", e.highlight(&map)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll generally want to avoid stringifying errors like this since it loses type information -- but isn't the highlighting handled by by parse itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but isn't the highlighting handled by by parse itself?

There are two parses here - SourceMap.parse and UnresolvedPackageGroup.parse. The former is what we call here, and it doesn't do highlights anymore (it returns structured errors which can then be highlighted when needed).

We'll generally want to avoid stringifying errors like this since it loses type information

Correct. I just didn't touch the UnresolvedPackageGroup struct for now. This struct just contains methods that are a wrapper for things like "read file, add string to SourceMap, call SourceMap.parse", ie:

  • UnresolvedPackageGroup::parse - convert path to string, then parse the contents
  • UnresolvedPackageGroup::parse_str - this is a thin wrapper around SourceMap::parse so it can start returning PackageParseErrors
  • UnresolvedPackageGroup::parse_path - calls UnresolvedPackageGroup::parse_dir or UnresolvedPackageGroup::parse_file depending on whether the path is a dir or a file
  • UnresolvedPackageGroup::parse_file - reads the file contents, and then parses them
  • UnresolvedPackageGroup::parse_dir - adds all files in the directory to the source map, then parses it.

If we were to move all of these to return ParseErrors, then ParseErrors` would have to gain variants for bad IO/path operations, and I'm not sure if we want to go there? It might be good to keep parsing totally separate from things like this.

I actually considered deprecating/removing most of these methods, because the consumers can just replicate the functionality themselves. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind adding some brief comments to public types/enums/structs in this file as well as the module itself?

To bikeshed a bit, I find the naming here a bit too wordy. WDYT about something like this:

  • PackageParseErrors => ParseErrors.
  • (future) FileParseErrors => FileErrors
  • (new) type ParseResult<T, E = ParseErrors> = Result<T, E>
  • (new) type LexResult<T, E = lex::Error> = Result<T, E>

That to me keeps the same flexibility for package/file distinctions insofar as there's different classes of errors, but I think it's useful for error ergonomics to not really get in the way when reading/writing code. To that extent I'm worried how verbose the error handling here is, so I'm hoping some renamings/idioms can help reduce the noise without adding too much complexity.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. I'll do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants